Look at the First Sentence: Position Bias in Question Answering
Many extractive question answering models are trained to predict start and
end positions of answers. The choice of predicting answers as positions is
mainly due to its simplicity and effectiveness. In this study, we hypothesize
that when the distribution of the answer positions is highly skewed in the
training set (e.g., answers lie only in the k-th sentence of each passage), QA
models predicting answers as positions can learn spurious positional cues and
fail to give answers in different positions. We first illustrate this position
bias in popular extractive QA models such as BiDAF and BERT and thoroughly
examine how position bias propagates through each layer of BERT. To safely
deliver position information without position bias, we train models with
various de-biasing methods including entropy regularization and bias
ensembling. Among them, we find that using the prior distribution of answer
positions as the bias model is highly effective at reducing position bias,
recovering the performance of BERT from 37.48% to 81.64% when trained on a
biased SQuAD dataset.
Comment: 13 pages, EMNLP 202
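The prior-as-bias-model idea above can be sketched as a product-of-experts style loss. This is a minimal illustration under assumed details, not the paper's exact implementation; `bias_product_loss` and the toy prior are hypothetical names and values:

```python
import numpy as np

# Minimal sketch of de-biasing with the answer-position prior as a bias
# model, in a product-of-experts style (an assumed formulation for
# illustration). `position_prior` is a toy distribution over candidate
# answer start positions.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bias_product_loss(qa_logits, position_prior, gold_start):
    """Cross-entropy on softmax(model logits + log prior).

    Because the prior already explains answers at frequent positions,
    the QA model is pushed to learn cues beyond position alone.
    """
    combined = qa_logits + np.log(position_prior + 1e-12)
    return -np.log(softmax(combined)[gold_start] + 1e-12)

# Toy passage with 5 candidate start positions; the prior is skewed
# toward position 0, mirroring the biased training set in the abstract.
logits = np.zeros(5)  # an indifferent, untrained model
prior = np.array([0.8, 0.05, 0.05, 0.05, 0.05])
loss_easy = bias_product_loss(logits, prior, gold_start=0)  # prior agrees
loss_hard = bias_product_loss(logits, prior, gold_start=3)  # prior disagrees
```

With an indifferent model, the ensembled loss is already low where the prior is confident and high elsewhere, so training gradients mostly reward evidence that the positional prior cannot supply.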
Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Questions in open-domain question answering are often ambiguous, allowing
multiple interpretations. One approach to handling them is to identify all
possible interpretations of the ambiguous question (AQ) and to generate a
long-form answer addressing them all, as suggested by Stelmakh et al. (2022).
While this approach provides a comprehensive response without asking the user
for clarification, considering multiple dimensions of ambiguity and gathering
the corresponding knowledge remains challenging. To address this challenge, we
propose a novel framework, Tree of Clarifications (ToC): It recursively
constructs a tree of disambiguations for the AQ -- via few-shot prompting
leveraging external knowledge -- and uses it to generate a long-form answer.
ToC outperforms existing baselines on ASQA in a few-shot setup across the
metrics, while surpassing fully-supervised baselines trained on the whole
training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at
https://github.com/gankim/tree-of-clarifications.
Comment: Accepted to EMNLP 202
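The recursive construction described above can be sketched with a stubbed LLM call. Here `ask_llm`, the tree layout, and the depth limit are assumptions for illustration, not ToC's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of recursively building a tree of disambiguations
# for an ambiguous question (AQ). `ask_llm` is a deterministic stand-in
# for a few-shot-prompted LLM grounded in retrieved passages.

@dataclass
class Node:
    question: str
    children: list = field(default_factory=list)

def ask_llm(prompt: str) -> list:
    # Stub: pretend every question splits into two readings until the
    # questions become specific enough (two "/" markers deep here).
    q = prompt.split(": ", 1)[1]
    return [] if q.count("/") >= 2 else [q + "/a", q + "/b"]

def build_tree(question: str, depth: int = 0, max_depth: int = 2) -> Node:
    node = Node(question)
    if depth < max_depth:
        for dq in ask_llm(f"List disambiguations of: {question}"):
            node.children.append(build_tree(dq, depth + 1, max_depth))
    return node

def iter_leaves(node: Node):
    if not node.children:
        yield node
    for c in node.children:
        yield from iter_leaves(c)

def long_form_answer(node: Node) -> str:
    # Aggregate the leaves into one response; a real system would first
    # answer each disambiguated question from retrieved evidence.
    leaves = [n.question for n in iter_leaves(node)]
    return " ".join(f"If you mean {q}: ..." for q in leaves)
```

The key design point the sketch preserves is that disambiguation is recursive: each clarified reading can itself be ambiguous, and the long-form answer is assembled from the tree's leaves.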
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
In this paper, we introduce CheXOFA, a new pre-trained vision-language model
(VLM) for the chest X-ray domain. Our model is initially pre-trained on various
multimodal datasets within the general domain before being transferred to the
chest X-ray domain. Following a prominent VLM, we unify various domain-specific
tasks into a simple sequence-to-sequence schema. This enables the model to
effectively learn the required knowledge and skills from limited resources in
the domain. Demonstrating superior performance on the benchmark datasets
provided by the BioNLP shared task, our model benefits from its training across
multiple tasks and domains. With subtle techniques including ensemble and
factual calibration, our system achieves first place on the RadSum23
leaderboard for the hidden test set.
Comment: Published at BioNLP workshop @ ACL 202
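The sequence-to-sequence unification mentioned above can be illustrated with a toy template mapper. The task names and templates here are hypothetical assumptions, not CheXOFA's actual schema:

```python
# Hypothetical sketch of casting heterogeneous chest X-ray tasks into a
# single text-to-text schema. Task names and templates are illustrative
# assumptions, not CheXOFA's actual prompt format.

TEMPLATES = {
    "report_summarization": "summarize the findings: {findings}",
    "vqa": "answer the question: {question}",
    "report_generation": "describe the image: {image_tokens}",
}

def to_seq2seq(task: str, inputs: dict) -> str:
    # Every task becomes a plain-text source sequence, so a single
    # encoder-decoder can be trained jointly on all of them.
    return TEMPLATES[task].format(**inputs)
```

Framing every task as text-to-text is what lets one model share knowledge across tasks and domains, which matters most when in-domain data is scarce.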
Towards More Realistic Generation of Information-Seeking Conversations
In this paper, we introduce SimSeek, a novel framework for simulating
information-seeking conversations from unlabeled documents, and compare two of
its variants to provide a deeper perspective on information-seeking
behavior. We first introduce a strong simulator for information-symmetric
conversation, SimSeek-sym, where questioner and answerer share all knowledge
when conversing with one another. Although it simulates reasonable
conversations, we take a further step toward more realistic information-seeking
conversation and propose SimSeek-asym, which assumes information asymmetry
between two agents, which encourages the questioner to seek new information
from an inaccessible document. In our experiments, we demonstrate that
SimSeek-asym successfully generates information-seeking conversations for two
downstream tasks, CQA and conversational search. In particular, SimSeek-asym
improves baseline models by 1.1-1.9 F1 score on QuAC and by 1.1 MRR on
OR-QuAC. Moreover, we thoroughly analyze our synthetic datasets to identify
crucial factors for realistic information-seeking conversation.
Comment: 10 pages, preprint
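The information asymmetry described above can be sketched with two stubbed agents. Both agent functions are hypothetical stand-ins for the learned models in the framework:

```python
# Hypothetical sketch of SimSeek-asym's setup: the questioner sees only
# the conversation history, while the answerer additionally sees the
# document that is inaccessible to the questioner. Both agents are toy
# stubs standing in for learned models.

def questioner(history: list) -> str:
    # Stub question generator: note it has no access to the document,
    # so it must seek information it does not already have.
    return f"Q{len(history) // 2 + 1}?"

def answerer(document: str, history: list, question: str) -> str:
    # Stub CQA model: a real answerer extracts a grounded span.
    return document[:10]

def simulate(document: str, turns: int = 3) -> list:
    history = []
    for _ in range(turns):
        q = questioner(history)            # asymmetric: no document here
        a = answerer(document, history, q)
        history += [q, a]
    return history
```

The asymmetry lives entirely in the call signatures: only `answerer` receives `document`, which is what encourages questions that genuinely seek new information.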